Data Allocation Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-dimensional Grids
Authors
Abstract
We study the implementation of dense linear algebra computations, such as matrix multiplication and linear system solvers, on two-dimensional (2D) grids of heterogeneous processors. For these operations, 2D grids are the key to scalability and efficiency. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these operations on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous 2D grids with respect to the performance of the processors. The practical usefulness of these strategies is fully demonstrated by experimental data for a heterogeneous network of workstations.
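To illustrate the core idea of speed-aware data allocation, here is a minimal sketch (not the authors' exact algorithm) of a static heuristic: matrix columns are assigned to processors in proportion to their relative speeds, using a largest-remainder rounding rule. The function `proportional_allocation` is a hypothetical helper introduced for illustration.

```python
def proportional_allocation(n_cols, speeds):
    """Assign n_cols matrix columns to processors with the given relative
    speeds, keeping each share close to speed_i / sum(speeds).

    A sketch of speed-proportional load balancing; the paper's actual
    allocation strategies are more sophisticated (block-cyclic variants)."""
    total = sum(speeds)
    # Ideal (fractional) share for each processor.
    ideal = [n_cols * s / total for s in speeds]
    alloc = [int(x) for x in ideal]  # floor of each share
    # Hand out the remaining columns to the processors whose
    # fractional parts are largest (largest-remainder rule).
    remaining = n_cols - sum(alloc)
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in order[:remaining]:
        alloc[i] += 1
    return alloc

# Example: three processors with relative speeds 3:2:1 sharing 12 columns.
print(proportional_allocation(12, [3, 2, 1]))  # [6, 4, 2]
```

With a uniform block-cyclic distribution, each of the three processors above would receive 4 columns and the run time would be set by the slowest one; the proportional split instead equalizes the per-processor compute time.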
Related papers
A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor...
Load Balancing Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-Dimensional Grids
Data parallel scheduling of Operations in Linear Algebra on heterogeneous clusters
The aim of data and task parallel scheduling for dense linear algebra kernels is to minimize the processing time of an application composed of several linear algebra kernels. The scheduling strategy presented here combines the task parallelism used when scheduling independent tasks and the data parallelism used for linear algebra kernels. This problem has been studied for scheduling independent...
Exposing Inner Kernels and Block Storage for Fast Parallel Dense Linear Algebra Codes
Efficient execution on processors with multiple cores requires the exploitation of parallelism within the processor. For many dense linear algebra codes this, in turn, requires the efficient execution of codes which operate on relatively small matrices. Efficient implementations of dense Basic Linear Algebra Subroutines exist (BLAS libraries). However, calls to BLAS libraries introduce large ov...
Accelerating GPU Kernels for Dense Linear Algebra
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting – a set of GPU specific optimiz...